1 Liver Integration Methods

2 Objective

I will compare different integration methods to assess batch effects between the Liver 20Jan21 sample and the two Liver 2Jul21 samples.

3 Analysis

3.1 Original without any special integration

First, this is how the UMAP looks without any kind of integration.

Dll4/Myc KO and the triple mutant sit a bit further away from the other conditions, which may be due to the lower sequencing depth of these two conditions.

There is a clear pattern of higher-expressing cells toward the right of the UMAP, while these two conditions cluster to the left. So there may be some technical biases to address.
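One way to check the sequencing depth explanation is to compare the median UMI count per condition. The sketch below is only an illustration: `median_depth` is a hypothetical helper, the `Condition` column name is an assumption about how the metadata is labelled, and the toy numbers are invented. `nCount_RNA` is the per-cell UMI total that Seurat stores in the object metadata.

```r
# Median UMI count per group, sorted from lowest to highest.
# `meta` stands in for the metadata data frame of the Seurat object
# (e.g. seurat_obj@meta.data); the column names are assumptions.
median_depth <- function(meta, group_col = "Condition", depth_col = "nCount_RNA") {
  sort(tapply(meta[[depth_col]], meta[[group_col]], median))
}

# Toy metadata with invented values, only to show the call
toy_meta <- data.frame(
  Condition  = c("Control", "Control", "Dll4/Myc KO", "Dll4/Myc KO"),
  nCount_RNA = c(9000, 11000, 2500, 3500)
)

depth <- median_depth(toy_meta)
print(depth)
```

If the conditions that cluster apart also come out at the bottom of this ranking, that would be consistent with a depth-driven technical bias rather than biology.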

3.2 Liver Integration 2 weeks. Adding Ctrl 4 days to look for batch effects

A first control analysis is to add the Control(4d) cells to the whole dataset and see how they cluster: whether or not they fall on top of the Ctrl 2 week cells.

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7670
## Number of edges: 287676
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8948
## Number of communities: 7
## Elapsed time: 1 seconds

Looking at how the different conditions are distributed, we can see that Control(4d) does not overlap with Control 2 weeks. There is a batch effect here, mainly because of the differences in sequencing depth.
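The lack of overlap can also be put into numbers by tabulating, for each condition, the fraction of its cells that falls in each cluster. This is a minimal sketch on toy metadata: `condition_by_cluster` is a hypothetical helper, `Condition` and the label `Control(2w)` are assumed names, and `seurat_clusters` is the column that `FindClusters` writes to the metadata.

```r
# Fraction of each condition's cells per cluster (each row sums to 1).
# `meta` stands in for seurat_obj@meta.data; column names are assumptions.
condition_by_cluster <- function(meta, group_col = "Condition",
                                 cluster_col = "seurat_clusters") {
  counts <- table(meta[[group_col]], meta[[cluster_col]])
  prop.table(counts, margin = 1)
}

# Toy metadata with invented assignments, only to show the call
toy_meta <- data.frame(
  Condition       = c("Control(4d)", "Control(4d)", "Control(2w)", "Control(2w)"),
  seurat_clusters = factor(c(0, 0, 1, 1))
)

fractions <- condition_by_cluster(toy_meta)
print(round(fractions, 2))
```

Two control groups that were well mixed would show similar rows in this table; rows concentrated in different clusters point to a batch effect.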

3.3 Integrating by Conditions

First, I will try the integration considering all 7 conditions, using a standard Seurat method.
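For reference, a standard Seurat integration could look roughly like the sketch below. It assumes the anchor-based workflow of Seurat 4 (`FindIntegrationAnchors` followed by `IntegrateData`); the report does not state which method was used. The function name `integrate_by`, the column passed as `split_col`, and the parameter values (2000 features, 30 dimensions, resolution 0.5) are assumptions, not the settings of this analysis.

```r
# Hypothetical wrapper around the Seurat 4 anchor-based integration workflow.
# `obj` is a Seurat object; `split_col` names the metadata column to split on.
integrate_by <- function(obj, split_col, dims = 1:30, resolution = 0.5) {
  # One object per group, each normalized with its own variable features
  obj_list <- Seurat::SplitObject(obj, split.by = split_col)
  obj_list <- lapply(obj_list, function(x) {
    x <- Seurat::NormalizeData(x, verbose = FALSE)
    Seurat::FindVariableFeatures(x, selection.method = "vst",
                                 nfeatures = 2000, verbose = FALSE)
  })

  # Find anchors between the groups and build the integrated assay
  features <- Seurat::SelectIntegrationFeatures(object.list = obj_list)
  anchors <- Seurat::FindIntegrationAnchors(object.list = obj_list,
                                            anchor.features = features,
                                            dims = dims)
  integrated <- Seurat::IntegrateData(anchorset = anchors, dims = dims)

  # Downstream analysis on the integrated assay
  Seurat::DefaultAssay(integrated) <- "integrated"
  integrated <- Seurat::ScaleData(integrated, verbose = FALSE)
  integrated <- Seurat::RunPCA(integrated, npcs = max(dims), verbose = FALSE)
  integrated <- Seurat::RunUMAP(integrated, reduction = "pca", dims = dims)
  integrated <- Seurat::FindNeighbors(integrated, reduction = "pca", dims = dims)
  Seurat::FindClusters(integrated, resolution = resolution)
}
```

Under these assumptions, a call such as `integrate_by(liver, split_col = "Condition")` would treat each of the 7 conditions as its own batch, where `liver` is a placeholder name for the merged object.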

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7670
## Number of edges: 233295
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8820
## Number of communities: 8
## Elapsed time: 0 seconds

The integration by condition is far too aggressive. Integrating by condition is not the way to go; I will have to integrate by sequencing depth instead.

That means splitting this big dataset into two: the old sequencing run with high depth and the new sequencing run with low depth.

For that, I will create a new category “SeqDepth” and split the dataset into “High” and “Low”.

3.4 Integrating by Sequencing Depth

With the new “SeqDepth” category splitting the dataset into “High” and “Low”, I ran the integration across those two groups.
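The “SeqDepth” label can be derived from the sequencing batch. The following is a minimal sketch on a toy data frame: `add_seq_depth` is a hypothetical helper, the `Batch` column name is an assumption, and treating 20Jan21 as the old, high-depth run is inferred from the sample dates rather than stated explicitly.

```r
# Label each cell "High" or "Low" according to its sequencing batch.
# `meta` stands in for seurat_obj@meta.data; `batch_col` and
# `high_batches` are assumptions about how the batches are recorded.
add_seq_depth <- function(meta, batch_col = "Batch", high_batches = "20Jan21") {
  meta$SeqDepth <- ifelse(meta[[batch_col]] %in% high_batches, "High", "Low")
  meta
}

# Toy metadata, only to show the call
toy_meta <- data.frame(Batch = c("20Jan21", "2Jul21", "2Jul21"))
toy_meta <- add_seq_depth(toy_meta)
print(table(toy_meta$SeqDepth))
```

On a real Seurat object the same vector can be stored as a metadata column with `obj$SeqDepth <- ...`, after which the dataset can be split on `SeqDepth` for the integration.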

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7670
## Number of edges: 294710
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9004
## Number of communities: 7
## Elapsed time: 0 seconds

This works much better; however, we again lose the AV zonation.

Now that this looks like the right strategy, I will run the integration by sequencing depth again, this time leaving out the Control(4d) group.

4 Conclusion

Integration should be done by sequencing depth, not by condition, as this better addresses the batch effects.

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] patchwork_1.1.1    yaml_2.2.1         rmarkdown_2.11     dplyr_1.0.7        ggplot2_3.3.5      SeuratObject_4.0.4 Seurat_4.0.5       knitr_1.36         BiocStyle_2.18.1  
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15            colorspace_2.0-2      deldir_1.0-6          ellipsis_0.3.2        ggridges_0.5.3        spatstat.data_2.1-0   farver_2.1.0          leiden_0.3.9          listenv_0.8.0         ggrepel_0.9.1         RSpectra_0.16-0       fansi_0.5.0           codetools_0.2-18      splines_4.0.3         polyclip_1.10-0       jsonlite_1.7.2        ica_1.0-2             cluster_2.1.2         png_0.1-7             uwot_0.1.11           shiny_1.7.1           sctransform_0.3.2     spatstat.sparse_2.0-0 BiocManager_1.30.16   compiler_4.0.3        httr_1.4.2            Matrix_1.3-4          fastmap_1.1.0         lazyeval_0.2.2        later_1.3.0           htmltools_0.5.2       tools_4.0.3           igraph_1.2.9          gtable_0.3.0          glue_1.5.1            RANN_2.6.1            reshape2_1.4.4        Rcpp_1.0.7            scattermore_0.7       jquerylib_0.1.4       vctrs_0.3.8           nlme_3.1-153          lmtest_0.9-39         xfun_0.26             stringr_1.4.0         globals_0.14.0        mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.1       irlba_2.3.3           goftest_1.2-3         future_1.23.0         MASS_7.3-54           zoo_1.8-9             scales_1.1.1          spatstat.core_2.3-2   promises_1.2.0.1      spatstat.utils_2.2-0  parallel_4.0.3        RColorBrewer_1.1-2    reticulate_1.22       pbapply_1.5-0         gridExtra_2.3         sass_0.4.0            rpart_4.1-15          stringi_1.7.6         highr_0.9             rlang_0.4.12          pkgconfig_2.0.3       matrixStats_0.61.0    evaluate_0.14         lattice_0.20-45       ROCR_1.0-11           purrr_0.3.4           tensor_1.5            labeling_0.4.2        htmlwidgets_1.5.4     cowplot_1.1.1         tidyselect_1.1.1      parallelly_1.29.0     RcppAnnoy_0.0.19      plyr_1.8.6            magrittr_2.0.1        bookdown_0.24         R6_2.5.1              generics_0.1.1        withr_2.4.3           pillar_1.6.4          mgcv_1.8-38           fitdistrplus_1.1-6    
## [101] survival_3.2-13       abind_1.4-5           tibble_3.1.6          future.apply_1.8.1    crayon_1.4.2          KernSmooth_2.23-20    utf8_1.2.2            spatstat.geom_2.3-0   plotly_4.10.0         grid_4.0.3            data.table_1.14.2     digest_0.6.29         xtable_1.8-4          tidyr_1.1.4           httpuv_1.6.3          munsell_0.5.0         viridisLite_0.4.0     bslib_0.3.1